Neural networks and hand-written image recognition

Author: Leonardo Espin

Date: 1/10/2019

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as mpimg #to work with raster images
%matplotlib inline
  • The dataframe XDF contains image data that has been flattened to vectors. It holds 5000 training examples of 20x20-pixel images (each a vector of 400 elements)
  • The dataframe yDF contains the label for each image. The images corresponding to the digit zero have been labeled as 10 (a quick check is shown after the data is loaded below)
  • A neural network with a single hidden layer of 25 activation units has been trained to classify the images according to the labels. The weights of the hidden layer $\theta_{1,0},\dots,\theta_{1,400},\theta_{2,0},\dots,\theta_{25,400}$ are in the dataframe theta1DF, and the weights of the output layer are in theta2DF

This is a subset of the MNIST handwritten digit dataset

In [2]:
XDF=pd.read_csv('ex3data1-X.csv',header=None)
yDF=pd.read_csv('ex3data1-y.csv',header=None)
theta1DF=pd.read_csv('ex3weights-T1.csv',header=None)
theta2DF=pd.read_csv('ex3weights-T2.csv',header=None)
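A quick sanity check of the label convention described above (the digit zero is stored as label 10); this is just a look at the label counts:

In [ ]:
#counts per label: 1-9 are the digits 1-9, label 10 encodes the digit 0
yDF[0].value_counts().sort_index()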
In [3]:
print(XDF.shape)
XDF.head()
(5000, 400)
Out[3]:
0 1 2 3 4 5 6 7 8 9 ... 390 391 392 393 394 395 396 397 398 399
0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
1 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
2 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
3 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
4 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0

5 rows × 400 columns

In [4]:
print(theta1DF.shape)
print(theta2DF.shape)
theta2DF.head()
(25, 401)
(10, 26)
Out[4]:
0 1 2 3 4 5 6 7 8 9 ... 16 17 18 19 20 21 22 23 24 25
0 -0.76100 -1.21240 -0.10187 -2.36850 -1.05780 -2.208200 0.56384 1.21110 2.21030 0.44456 ... -0.23366 -1.5201 1.15320 0.10368 -0.37208 -0.61530 -0.12568 -2.271900 -0.71836 -1.29690
1 -0.61785 0.61559 -1.26550 1.85750 -0.91853 -0.055026 -0.38590 1.29520 -1.56840 -0.97026 ... -2.44170 -0.8563 -0.29826 -2.07950 -1.29330 0.89982 0.28307 2.311800 -2.46440 1.45660
2 -0.68934 -1.94540 2.01360 -3.12320 -0.23618 1.386800 0.90982 -1.54770 -0.79831 -0.65600 ... -1.63900 1.2027 -1.20250 -1.83450 -1.88010 -0.34056 0.23692 -1.061400 1.02760 -0.47691
3 -0.67832 0.46299 0.58492 -0.16502 1.93260 -0.229660 -1.84730 0.49012 1.07150 -3.31910 ... -0.68428 -1.6471 0.21153 -0.27422 1.72600 1.32420 -2.63980 -0.080559 -2.03510 -1.46120
4 -0.59664 -2.04480 2.05700 1.95100 0.17638 -2.161400 -0.40395 1.80160 -1.56280 -0.25253 ... -0.67489 1.1407 1.32430 3.21160 -2.15890 -2.60160 -3.22260 -1.896100 -0.87488 2.51040

5 rows × 26 columns

The images correspond to hand-drawn digits (0 to 9), and they can be displayed with the imshow command:

In [5]:
tmp=XDF.iloc[0,:].values.reshape(20,20)
plt.imshow(tmp);

Below I show a mosaic of 100 images selected at random from the training samples (notice that the bitmap arrays have to be transposed to be shown correctly, since the images were flattened column by column):

In [6]:
import random
#select 100 images (rows) at random; randint is inclusive on both ends
selection=[random.randint(0,4999) for x in range(100)];

image=np.zeros((20*10, 20*10)) #for constructing a mosaic of 10x10 images
coords=[(x,y) for x in range(1,11) for y in range(1,11)];
for k,tup in enumerate(coords):
    indYa=0+20*(tup[0]-1)
    indYb=19+20*(tup[0]-1)
    indXa=0+20*(tup[1]-1)
    indXb=19+20*(tup[1]-1)
    image[indYa:indYb+1,indXa:indXb+1]=XDF.iloc[selection[k],:].values.reshape(20,20).transpose()

plt.figure(figsize=(8,8))
plt.imshow(image);
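
Equivalently, the transpose can be avoided by reshaping in column-major (Fortran) order. A small sketch for a single tile:

In [ ]:
#reshape(20,20,order='F') fills the array column by column, which is
#equivalent to reshape(20,20).transpose()
tile=XDF.iloc[selection[0],:].values.reshape(20,20,order='F')
plt.imshow(tile);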

The structure of the trained neural network is shown below

[Figure: network diagram with a 400-unit input layer, a 25-unit hidden layer and a 10-unit output layer, plus bias units]
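
In this notation, forward propagation for a flattened image $x$ (with the bias unit prepended, $a^{(1)}=[1;\,x]$) is

$$z^{(2)}=\Theta^{(1)}a^{(1)},\qquad a^{(2)}=g\!\left(z^{(2)}\right),$$

$$z^{(3)}=\Theta^{(2)}\left[1;\,a^{(2)}\right],\qquad a^{(3)}=g\!\left(z^{(3)}\right)=h_\Theta(x),$$

where $g$ is the sigmoid function, $\Theta^{(1)}$ is the $25\times 401$ matrix in theta1DF and $\Theta^{(2)}$ is the $10\times 26$ matrix in theta2DF. These are exactly the computations carried out step by step below.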

Below I apply the neural network to the training set in XDF. Note that a column (or row) of ones has to be added to the flattened images to account for the bias units.

In [7]:
#add a column (axis=1) of ones (3rd argument) to the image data (1st argument) at
#the beginning of the matrix (2nd argument)
X=np.insert(XDF.values, 0, 1, axis=1)
X[0:5,0:15]
Out[7]:
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

Multiplying each image by the hidden-layer weights to obtain the $z^{(2)}$ values:

In [8]:
Z2=np.matmul(theta1DF.values,X.transpose())
print(Z2.shape)
(25, 5000)

The values $a^{(2)}_i$, $i=1,\dots,25$ are obtained by applying the sigmoid function (notice that the function is vectorized and applied simultaneously to the whole Z2 matrix):

In [9]:
def g(z):
    return 1/(1+np.exp(-z))

g = np.vectorize(g)
In [10]:
A2=g(Z2)
print(A2.shape)
A2[0:3,0:5]
(25, 5000)
Out[10]:
array([[0.0503631 , 0.00805789, 0.01419626, 0.04589353, 0.00188314],
       [0.07939735, 0.05105084, 0.02443577, 0.00361582, 0.07777682],
       [0.99300211, 0.93367502, 0.99751863, 0.99569588, 0.98854938]])

A row of ones is added to the matrix A2 to account for the bias unit:

In [11]:
A2=np.insert(A2, 0, 1, axis=0)
A2[0:4,0:5]
Out[11]:
array([[1.        , 1.        , 1.        , 1.        , 1.        ],
       [0.0503631 , 0.00805789, 0.01419626, 0.04589353, 0.00188314],
       [0.07939735, 0.05105084, 0.02443577, 0.00361582, 0.07777682],
       [0.99300211, 0.93367502, 0.99751863, 0.99569588, 0.98854938]])

Below are the calculations for the output layer, which has 10 nodes corresponding to the 10 categories of the hand-written symbols

In [12]:
Z3=np.matmul(theta2DF.values,A2)
A3=g(Z3)
print(A3.shape)
print('classification results of first 4 images (choose max value per column):')
A3[0:10,0:4]
(10, 5000)
classification results of first 4 images (choose max value per column):
Out[12]:
array([[1.12671887e-04, 4.79056232e-04, 8.85776815e-05, 5.57383895e-05],
       [1.74148916e-03, 2.41533708e-03, 3.24329935e-03, 8.05061830e-03],
       [2.52662295e-03, 3.44724408e-03, 2.55394811e-02, 1.78282512e-02],
       [1.84041460e-05, 4.05623351e-05, 2.13624508e-05, 8.65244918e-05],
       [9.36362070e-03, 6.53492371e-03, 3.96943735e-03, 6.42347222e-04],
       [3.99261936e-03, 1.75928814e-03, 1.02875046e-02, 1.14645561e-02],
       [5.51521866e-03, 1.15783184e-02, 3.86827771e-04, 1.85173645e-03],
       [4.01439251e-04, 2.39088978e-03, 6.22854803e-02, 4.69276085e-03],
       [6.48144766e-03, 1.97051630e-03, 5.49898509e-03, 8.21379857e-04],
       [9.95733748e-01, 9.95696578e-01, 9.28004235e-01, 9.94103811e-01]])
In [13]:
classification=np.argmax(A3,axis=0)
classification
Out[13]:
array([9, 9, 9, ..., 8, 8, 8])

Below I show a few classification results chosen at random

In [14]:
import time 

for _ in range(4):
    k=random.randint(0,99)  #valid indices of the 100-element selection list
    k=selection[k]
    print('learned value = {}'.format(classification[k]+1)) 
    tmp=XDF.iloc[k,:].values.reshape(20,20).transpose()
    plt.imshow(tmp,animated=True)
    plt.show()
    time.sleep(1.5)
learned value = 2
learned value = 7
learned value = 10
learned value = 7
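
In the output above, a learned value of 10 corresponds to the digit zero, following the label convention described at the top. To map the predictions back to digits 0 to 9, something like the following works:

In [ ]:
#argmax returns indices 0-9; adding 1 recovers the 1-10 label convention,
#and the modulo maps label 10 back to the digit 0
digits=(classification+1)%10
digits[:10]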

The overall classification accuracy is:

In [15]:
accuracy=(100*sum(classification.reshape(yDF.values.shape) == yDF.values-1)
          /len(classification))[0]
print('classification accuracy: {}%'.format(accuracy))
classification accuracy: 97.52%
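
Beyond the overall accuracy, one could check which digits get confused with each other; a minimal sketch using scikit-learn (which is also used in the next section):

In [ ]:
from sklearn.metrics import confusion_matrix
#rows are true labels (1-10, with 10 encoding zero), columns are predictions
confusion_matrix(yDF.values.flatten(), classification+1)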

Using scikit-learn to train the classification Neural Network

In [16]:
from sklearn.neural_network import MLPClassifier

#adam (the default) works well on relatively large datasets; for a small
#set like this one, 'lbfgs' can converge faster
clf = MLPClassifier(solver='adam',
                    alpha=1e-3,                 #reg. parameter lambda = 1/(2*500)
                    hidden_layer_sizes=(25,),   #1 hidden layer with 25 units
                    activation='logistic',      #the logistic sigmoid function, could change to relu
                    #max_iter=400,
                    validation_fraction=0.2)    #only used if early_stopping=True
In [17]:
clf.fit(XDF.values, yDF.values.flatten())#flatten reshapes the 5000x1 column matrix to a 1D array,
                                         #otherwise sklearn complains
/home/perro/anaconda3/lib/python3.6/site-packages/sklearn/neural_network/multilayer_perceptron.py:566: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
  % self.max_iter, ConvergenceWarning)
Out[17]:
MLPClassifier(activation='logistic', alpha=0.001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(25,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=200, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.2, verbose=False, warm_start=False)

The learned coefficients are below. Notice that we obtain a very high classification accuracy partly because it is measured on the same 5000 images used for training (about 8% of the entire dataset), so the model is most likely overfitting; a held-out estimate is sketched after the score below.

In [18]:
T1=clf.coefs_[0]
T2=clf.coefs_[1]
print(T1.shape)
print(T2.shape)
print('theta_0 coefficients:')
print(clf.intercepts_[0].shape)
print(clf.intercepts_[1].shape)
(400, 25)
(25, 10)
theta_0 coefficients:
(25,)
(10,)
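
For comparison with the manual forward propagation above, clf.predict can be reproduced with the learned weights. One caveat: for multi-class problems sklearn applies a softmax to the output layer rather than the logistic function, but the softmax does not change the argmax, so the predicted classes should agree. A sketch:

In [ ]:
#hidden activations with the logistic function g defined earlier
hidden=g(np.matmul(XDF.values,T1)+clf.intercepts_[0])   #shape (5000, 25)
scores=np.matmul(hidden,T2)+clf.intercepts_[1]          #shape (5000, 10)
manual=clf.classes_[np.argmax(scores,axis=1)]
print((manual==clf.predict(XDF.values)).mean())         #fraction of agreement with clf.predict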
In [19]:
accuracy=(100*sum(clf.predict(XDF.values).reshape(yDF.values.shape) == yDF.values)
          /len(classification))[0]
print('classification accuracy: {}%'.format(accuracy))
classification accuracy: 98.88%
In [20]:
#or more easily
clf.score(XDF.values, yDF.values)
Out[20]:
0.9888
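
Since the score above is computed on the same images the network was trained on, it is an optimistic estimate. A quick way to get a less biased number is to hold out part of the data; a minimal sketch with the same settings as above:

In [ ]:
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(
    XDF.values, yDF.values.flatten(), test_size=0.2, random_state=0)
clf_holdout=MLPClassifier(solver='adam', alpha=1e-3,
                          hidden_layer_sizes=(25,), activation='logistic')
clf_holdout.fit(X_train, y_train)
clf_holdout.score(X_test, y_test)   #accuracy on images not used for training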

A Keras DNN trained on the entire dataset can be seen here: Keras and the MNIST dataset
